It is crucial to evaluate the quality and determine the optimal number of clusters in cluster analysis. In this paper, the multi-granularity characterization of the data set is carried out to obtain the hyper-balls. The cluster internal evaluation index based on hyper-balls(HCVI) is defined. Moreover, a general method for determining the optimal number of clusters based on HCVI is proposed. The proposed methods can evaluate the clustering results produced by the several classic methods and determine the optimal cluster number for data sets containing noises and clusters with arbitrary shapes. The experimental results on synthetic and real data sets indicate that the new index outperforms existing ones.
translated by 谷歌翻译
最近的研究表明,深度神经网络(DNN)容易受到后门攻击的影响。感染的模型正常在良性输入上行为,而其预测将被迫对对抗数据进行攻击特定目标。已经开发了几种检测方法来区分投入来防御这种攻击。这些防御依赖的常见假设是受感染模型提取的清洁和对抗进口的潜在表示之间存在大的统计差异。但是,虽然缺乏假设是真实的重要性,但缺乏全面的研究。在本文中,我们专注于它并研究以下相关问题:1)统计差异的性质是什么? 2)如何在不损害攻击强度的情况下有效减少它们? 3)这种减少对基于差异的防御有何影响?我们的工作是在三个问题上进行的。首先,通过引入最大平均差异(MMD)作为指标,我们确定多级表示的统计差异都是大的,而不仅仅是最高级别。然后,我们通过在训练后门模型期间向损耗功能添加多级MMD约束来提出统计差异减少方法(SDRM),以有效降低差异。最后,检查了三种典型的基于差异的检测方法。这些防御的F1分数下降到定期训练的后门型号的90%-100%,在所有两个数据集,四个模型架构和四种攻击方法上用SDRM培训的型号上的60%-70%。结果表明,所提出的方法可用于增强现有攻击以逃避后门检测算法。
translated by 谷歌翻译
许多研究表明,深度神经网络很容易被对抗例误导。有效地评估模型的对抗性稳健性对于其在实际应用中的部署方面是重要的。目前,一种常见的评估类型是通过构建恶意实例和执行攻击来近似模型作为稳健性指标的对抗风险。不幸的是,近似值与真值之间存在错误(间隙)。以前的研究手动设计攻击方法以实现更小的错误,这是效率低下,可能会错过更好的解决方案。在本文中,我们建立了近似误差的收紧作为优化问题,并尝试用算法解决它。更具体地说,首先分析用替代损失替换非凸面和不连续的0-1损失,在计算近似时需要必要的妥协,是错误的主要原因之一。然后我们提出Autoloss-AR,这是搜索损失功能的第一种方法,用于收紧对抗风险的近似误差。广泛的实验是在多种设置中进行的。结果证明了该方法的有效性:最佳发现的损失函数分别在Mnist和CiFar-10上占据了手工基线的0.9%-2.9%和0.7%-2.0%。此外,我们还验证搜索后的损失可以转移到其他设置,并探索它们通过可视化本地丢失景观来探索它们优于基线。
translated by 谷歌翻译
生成的对抗网络(GANS)能够生成从真实图像视觉无法区分的图像。然而,最近的研究表明,生成和实际图像在频域中共享显着差异。在本文中,我们探讨了高频分量在GAN训练中的影响。根据我们的观察,在大多数GAN的培训期间,严重的高频差异使鉴别器聚焦在过度高频成分上,阻碍了发电机拟合了对学习图像内容很重要的低频分量。然后,我们提出了两个简单但有效的频率操作,以消除由GAN训练的高频差异引起的副作用:高频混淆(HFC)和高频滤波器(HFF)。拟议的操作是一般的,可以应用于大多数现有的GAN,一小部分成本。在多丢失函数,网络架构和数据集中验证了所提出的操作的高级性能。具体而言,拟议的HFF在Celeba(128 * 128)基于SSNGAN的Celeba无条件生成的Celeba(128 * 128)无条件一代,在Celeba无条件一代基于SSGAN的13.2 \%$ 30.2 \%$ 69.3 \%$ 69.3 \%$ FID在Celeba无条件一代基于Infomaxgan。
translated by 谷歌翻译
生成对抗网络(GAN)是最受欢迎的图像生成模型,在各种计算机视觉任务上取得了显着进度。但是,训练不稳定仍然是所有基于GAN的算法的开放问题之一。已经提出了许多方法来稳定gan的训练,其重点分别放在损失功能,正则化和归一化技术,训练算法和模型体系结构上。与上述方法不同,在本文中,提出了有关稳定gan训练的新观点。发现有时发电机产生的图像在训练过程中像歧视者的对抗示例一样,这可能是导致gan不稳定训练的原因的一部分。有了这一发现,我们提出了直接的对抗训练(DAT)方法来稳定gan的训练过程。此外,我们证明DAT方法能够适应歧视器的Lipschitz常数。 DAT的高级性能在多个损失功能,网络体系结构,超参数和数据集上进行了验证。具体而言,基于SSGAN的CIFAR-100无条件生成,DAT在CIFAR-100的无条件生成上实现了11.5%的FID,基于SSGAN的STL-10无条件生成的FID和基于SSGAN的LSUN卧室无条件生成的13.2%FID。代码将在https://github.com/iceli1007/dat-gan上找到
translated by 谷歌翻译
In this work, we focus on instance-level open vocabulary segmentation, intending to expand a segmenter for instance-wise novel categories without mask annotations. We investigate a simple yet effective framework with the help of image captions, focusing on exploiting thousands of object nouns in captions to discover instances of novel classes. Rather than adopting pretrained caption models or using massive caption datasets with complex pipelines, we propose an end-to-end solution from two aspects: caption grounding and caption generation. In particular, we devise a joint Caption Grounding and Generation (CGG) framework based on a Mask Transformer baseline. The framework has a novel grounding loss that performs explicit and implicit multi-modal feature alignments. We further design a lightweight caption generation head to allow for additional caption supervision. We find that grounding and generation complement each other, significantly enhancing the segmentation performance for novel categories. We conduct extensive experiments on the COCO dataset with two settings: Open Vocabulary Instance Segmentation (OVIS) and Open Set Panoptic Segmentation (OSPS). The results demonstrate the superiority of our CGG framework over previous OVIS methods, achieving a large improvement of 6.8% mAP on novel classes without extra caption data. Our method also achieves over 15% PQ improvements for novel classes on the OSPS benchmark under various settings.
translated by 谷歌翻译
For Prognostics and Health Management (PHM) of Lithium-ion (Li-ion) batteries, many models have been established to characterize their degradation process. The existing empirical or physical models can reveal important information regarding the degradation dynamics. However, there is no general and flexible methods to fuse the information represented by those models. Physics-Informed Neural Network (PINN) is an efficient tool to fuse empirical or physical dynamic models with data-driven models. To take full advantage of various information sources, we propose a model fusion scheme based on PINN. It is implemented by developing a semi-empirical semi-physical Partial Differential Equation (PDE) to model the degradation dynamics of Li-ion-batteries. When there is little prior knowledge about the dynamics, we leverage the data-driven Deep Hidden Physics Model (DeepHPM) to discover the underlying governing dynamic models. The uncovered dynamics information is then fused with that mined by the surrogate neural network in the PINN framework. Moreover, an uncertainty-based adaptive weighting method is employed to balance the multiple learning tasks when training the PINN. The proposed methods are verified on a public dataset of Li-ion Phosphate (LFP)/graphite batteries.
translated by 谷歌翻译
Recent studies have shown that using an external Language Model (LM) benefits the end-to-end Automatic Speech Recognition (ASR). However, predicting tokens that appear less frequently in the training set is still quite challenging. The long-tail prediction problems have been widely studied in many applications, but only been addressed by a few studies for ASR and LMs. In this paper, we propose a new memory augmented lookup dictionary based Transformer architecture for LM. The newly introduced lookup dictionary incorporates rich contextual information in training set, which is vital to correctly predict long-tail tokens. With intensive experiments on Chinese and English data sets, our proposed method is proved to outperform the baseline Transformer LM by a great margin on both word/character error rate and tail tokens error rate. This is achieved without impact on the decoding efficiency. Overall, we demonstrate the effectiveness of our proposed method in boosting the ASR decoding performance, especially for long-tail tokens.
translated by 谷歌翻译
Generalizability to unseen forgery types is crucial for face forgery detectors. Recent works have made significant progress in terms of generalization by synthetic forgery data augmentation. In this work, we explore another path for improving the generalization. Our goal is to reduce the features that are easy to learn in the training phase, so as to reduce the risk of overfitting on specific forgery types. Specifically, in our method, a teacher network takes as input the face images and generates an attention map of the deep features by a diverse multihead attention ViT. The attention map is used to guide a student network to focus on the low-attended features by reducing the highly-attended deep features. A deep feature mixup strategy is also proposed to synthesize forgeries in the feature domain. Experiments demonstrate that, without data augmentation, our method is able to achieve promising performances on unseen forgeries and highly compressed data.
translated by 谷歌翻译
This paper presents a novel framework for planning in unknown and occluded urban spaces. We specifically focus on turns and intersections where occlusions significantly impact navigability. Our approach uses an inpainting model to fill in a sparse, occluded, semantic lidar point cloud and plans dynamically feasible paths for a vehicle to traverse through the open and inpainted spaces. We demonstrate our approach using a car's lidar data with real-time occlusions, and show that by inpainting occluded areas, we can plan longer paths, with more turn options compared to without inpainting; in addition, our approach more closely follows paths derived from a planner with no occlusions (called the ground truth) compared to other state of the art approaches.
translated by 谷歌翻译